Separating text strings into a table of individual words in SQL via XML.

Comments 0

Share to social media

Nearly nine years ago, Mike Rorke of the SQL Server 2005 XML team blogged ‘Querying Over Constructed XML Using Sub-queries‘. I remember reading it at the time without being able to think of a use for what he was demonstrating. Just a few weeks ago, whilst preparing my article on searching strings, I got out my trusty function for splitting strings into words and something reminded me of the old blog.

I’d been trying to think of a way of using XML to split strings reliably into words. The routine I devised turned out to be slightly slower than the iterative word chop I’ve always used in the past, so I didn’t publish it. It was then I suddenly remembered the old routine. Here is my version of it.

I’ve unwrapped it from its obvious home in a function or procedure just so it is easy to appreciate. What it does is to chop a text string into individual words using XQuery and the good old nodes() method. I’ve benchmarked it and it is quicker than any of the SQL ways of doing it that I know about. Obviously, you can’t use the trick I described here to do it, because it is awkward to use REPLACE() on 1…n characters of whitespace. I’ll carry on using my iterative function since it is able to tell me the location of each word as a character-offset from the start, and also because this method leaves punctuation in (removing it takes time!). However, I can see other uses for this in passing lists as input or output parameters, or as return values.

…which gives (truncated, of course)…

Clip2.jpg

Load comments

About the author

Phil Factor

See Profile

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

Phil Factor's contributions